Predator drones shift from border patrol to protest surveillance

Los Angeles Times

An unmanned Predator drone flies over Kandahar Air Field in southern Afghanistan in 2010. MQ-9 Predator drones were deployed over Los Angeles to monitor anti-ICE protests in June.


What Lt. Col. Boz and Big Tech's Enlisted Execs Will Do in the Army

WIRED

When I read a tweet about four noted Silicon Valley executives being inducted into a special detachment of the United States Army Reserve, including Meta CTO Andrew "Boz" Bosworth, I questioned its veracity. It's very hard to discern truth from satire in 2025, in part because of social media sites owned by Bosworth's company. But it indeed was true. Boz is now Lieutenant Colonel Bosworth. The other newly commissioned officers include Kevin Weil, OpenAI's head of product; Bob McGrew, a former OpenAI head of research now advising Mira Murati's company Thinking Machines Lab; and Shyam Sankar, the CTO of Palantir.


Contemporary AI foundation models increase biological weapons risk

Brent, Roger, McKelvey, T. Greg Jr

arXiv.org Artificial Intelligence

The rapid advancement of artificial intelligence has raised concerns about its potential to facilitate biological weapons development. We argue existing safety assessments of contemporary foundation AI models underestimate this risk, largely due to flawed assumptions and inadequate evaluation methods. First, assessments mistakenly assume biological weapons development requires tacit knowledge, or skills gained through hands-on experience that cannot be easily verbalized. Second, they rely on imperfect benchmarks that overlook how AI can uplift both nonexperts and already-skilled individuals. To challenge the tacit knowledge assumption, we examine cases where individuals without formal expertise, including a 2011 Norwegian ultranationalist who synthesized explosives, successfully carried out complex technical tasks. We also review efforts to document pathogen construction processes, highlighting how such tasks can be conveyed in text. We identify "elements of success" for biological weapons development that large language models can describe in words, including steps such as acquiring materials and performing technical procedures. Applying this framework, we find that advanced AI models Llama 3.1 405B, ChatGPT-4o, and Claude 3.5 Sonnet can accurately guide users through the recovery of live poliovirus from commercially obtained synthetic DNA, challenging recent claims that current models pose minimal biosecurity risk. We advocate for improved benchmarks, while acknowledging the window for meaningful implementation may have already closed.


Fine-grained Hallucination Detection and Editing for Language Models

Mishra, Abhika, Asai, Akari, Balachandran, Vidhisha, Wang, Yizhong, Neubig, Graham, Tsvetkov, Yulia, Hajishirzi, Hannaneh

arXiv.org Artificial Intelligence

Large language models (LMs) are prone to generating diverse factually incorrect statements, which are widely called hallucinations. Current approaches predominantly focus on coarse-grained automatic hallucination detection or editing, overlooking nuanced error levels. In this paper, we propose a novel task -- automatic fine-grained hallucination detection -- and present a comprehensive taxonomy encompassing six hierarchically defined types of hallucination. To facilitate evaluation, we introduce a new benchmark that includes fine-grained human judgments on two LM outputs across various domains. Our analysis reveals that ChatGPT and Llama 2-Chat exhibit hallucinations in 60% and 75% of their outputs, respectively, and a majority of these hallucinations fall into categories that have been underexplored. As an initial step to address this, we train FAVA, a retrieval-augmented LM, by carefully designing synthetic data generation to detect and correct fine-grained hallucinations. On our benchmark, our automatic and human evaluations show that FAVA significantly outperforms ChatGPT on fine-grained hallucination detection, though large room for future improvement still exists. FAVA's suggested edits also improve the factuality of LM-generated text, resulting in 5-10% FActScore improvements.


"Merge Conflicts!" Exploring the Impacts of External Distractors to Parametric Knowledge Graphs

Qian, Cheng, Zhao, Xinran, Wu, Sherry Tongshuang

arXiv.org Artificial Intelligence

Large language models (LLMs) acquire extensive knowledge during pre-training, known as their parametric knowledge. However, in order to remain up-to-date and align with human instructions, LLMs inevitably require external knowledge during their interactions with users. This raises a crucial question: How will LLMs respond when external knowledge interferes with their parametric knowledge? To investigate this question, we propose a framework that systematically elicits LLM parametric knowledge and introduces external knowledge. Specifically, we uncover the impacts by constructing a parametric knowledge graph to reveal the different knowledge structures of LLMs, and introduce external knowledge through distractors of varying degrees, methods, positions, and formats. Our experiments on both black-box and open-source models demonstrate that LLMs tend to produce responses that deviate from their parametric knowledge, particularly when they encounter direct conflicts or confounding changes of information within detailed contexts. We also find that while LLMs are sensitive to the veracity of external knowledge, they can still be distracted by unrelated information. These findings highlight the risk of hallucination when integrating external knowledge, even indirectly, during interactions with current LLMs. All the data and results are publicly available.


The changing face of modern warfare: How 'cheap' drones are moving the Ukraine war from the trenches to city skyscrapers - and could be pivotal in Kyiv's fight to defeat Putin

Daily Mail - Science & tech

Ukraine has warned Vladimir Putin that more drone attacks are coming -- just hours after a flying bot smashed into one of Moscow's skyscrapers for the second time in as many days. Although Kyiv refuses to officially take responsibility for such attacks inside Russia, this latest skirmish is considered to be part of a wider offensive aimed at shifting the focus of the conflict to the Kremlin's doorstep. Experts say the way Kyiv is looking to do this is with the help of drones in the air and by sea -- a 'cheap', expendable technology which has been revolutionising modern warfare over the past two decades. It is certainly turning attention from the First World War-style trench warfare that has been raging throughout Ukraine since the conflict broke out -- and there's a reason the rest of the world is watching. Here, MailOnline looks at how drones are changing the face of future conflict, and why Ukraine is ratcheting up the use of them in an attempt to win the propaganda war and turn the tide of Putin's invasion.


OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization

Iyer, Srinivasan, Lin, Xi Victoria, Pasunuru, Ramakanth, Mihaylov, Todor, Simig, Daniel, Yu, Ping, Shuster, Kurt, Wang, Tianlu, Liu, Qing, Koura, Punit Singh, Li, Xian, O'Horo, Brian, Pereyra, Gabriel, Wang, Jeff, Dewan, Christopher, Celikyilmaz, Asli, Zettlemoyer, Luke, Stoyanov, Ves

arXiv.org Artificial Intelligence

Recent work has shown that fine-tuning large pre-trained language models on a collection of tasks described via instructions, a.k.a. instruction-tuning, improves their zero and few-shot generalization to unseen tasks. However, there is a limited understanding of the performance trade-offs of different decisions made during the instruction-tuning process. These decisions include the scale and diversity of the instruction-tuning benchmark, different task sampling strategies, fine-tuning with and without demonstrations, training using specialized datasets for reasoning and dialogue, and finally, the fine-tuning objectives themselves. In this paper, we characterize the effect of instruction-tuning decisions on downstream task performance when scaling both model and benchmark sizes. To this end, we create OPT-IML Bench: a large benchmark for Instruction Meta-Learning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, and prepare an evaluation framework to measure three types of model generalization: to tasks from fully held-out categories, to held-out tasks from seen categories, and to held-out instances from seen tasks. Through the lens of this framework, we first present insights about instruction-tuning decisions as applied to OPT-30B and further exploit these insights to train OPT-IML 30B and 175B, which are instruction-tuned versions of OPT. OPT-IML demonstrates all three generalization abilities at both scales on four different evaluation benchmarks with diverse tasks and input formats -- PromptSource, FLAN, Super-NaturalInstructions, and UnifiedSKG. Not only does it significantly outperform OPT on all benchmarks, but it is also highly competitive with existing models fine-tuned on each specific benchmark. We release OPT-IML at both scales, together with the OPT-IML Bench evaluation framework.


Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision

Du, Wanyu, Kim, Zae Myung, Raheja, Vipul, Kumar, Dhruv, Kang, Dongyeop

arXiv.org Artificial Intelligence

Revision is an essential part of the human writing process. It tends to be strategic, adaptive, and, more importantly, iterative in nature. Despite the success of large language models on text revision tasks, they are limited to non-iterative, one-shot revisions. Examining and evaluating the capability of large language models for making continuous revisions and collaborating with human writers is a critical step towards building effective writing assistants. In this work, we present a human-in-the-loop iterative text revision system, Read, Revise, Repeat (R3), which aims at achieving high-quality text revisions with minimal human effort by reading model-generated revisions and user feedback, revising documents, and repeating human-machine interactions. In R3, a text revision model provides text editing suggestions for human writers, who can accept or reject the suggested edits. The accepted edits are then incorporated into the model for the next iteration of document revision. Writers can therefore revise documents iteratively by interacting with the system and simply accepting/rejecting its suggested edits until the text revision model stops making further revisions or reaches a predefined maximum number of revisions. Empirical experiments show that R3 can generate revisions with an acceptance rate comparable to human writers at early revision depths, and the human-machine interaction can yield higher-quality revisions with fewer iterations and edits. The collected human-model interaction dataset and system code are available at \url{https://github.com/vipulraheja/IteraTeR}. Our system demonstration is available at \url{https://youtu.be/lK08tIpEoaE}.
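The accept/reject loop the abstract describes can be outlined in a few lines. The sketch below is a hypothetical illustration of that control flow, not the actual R3 system: `suggest_edits` and `accept` are stand-ins for the paper's revision model and the human reviewer.

```python
# Minimal sketch of an R3-style human-in-the-loop revision loop.
# `suggest_edits` and `accept` are hypothetical stand-ins for the
# revision model and the human writer described in the paper.

def iterative_revision(text, suggest_edits, accept, max_iters=3):
    """Repeat: propose edits, keep only the accepted ones, apply them,
    until the model proposes nothing or max_iters is reached."""
    for _ in range(max_iters):
        edits = suggest_edits(text)             # model proposes (old, new) spans
        kept = [e for e in edits if accept(e)]  # human accepts or rejects each
        if not kept:
            break                               # model has no accepted revisions left
        for old, new in kept:
            text = text.replace(old, new)       # incorporate accepted edits
    return text

# Toy demonstration with rule-based stand-ins:
suggest = lambda t: [("teh", "the")] if "teh" in t else []
result = iterative_revision("teh cat sat", suggest, accept=lambda e: True)
```

In the real system the stopping condition and edit application are handled by the revision model and interface; the point here is only the iterate-until-converged structure.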


Edge.org

#artificialintelligence

The conversation is on hold. The Edge community has hit the road... or they're staying home. Preparing for the academic year to begin, wrapping up projects and starting new ones, celebrating with family and friends or contemplating in solitude. After a hiatus, Edge is pleased to revive Summer Postcards: Edgies reporting in from wherever they are and on whatever they're doing, as the dog days wind out and the season comes to a close. As the world slowly returns to a "new normal" with enduring COVID restrictions in the midst of renewed vaccine freedoms, this year's collection is a testament to change (temporary and lasting), a consideration of loss (will travel ever be like it was?), and a celebration of questions (that still need answering). The hammock may be away until next year, but the memories remain. I spent the summer writing and revising the final section of a longish novel I started in 2019. It seems now as though I've been from 1946 to 2021 on my hands and knees. Various lockdowns have been a liberation from obligations and the luggage carousel, and I've never known such sweet and total focus for months on end. We have the luxury of living in the country--no shortage of big skies and moody walks. All our few breaks were in the UK--Scotland, the Lake District, the West Country. Even in our remote part of the Lakes, I had to keep on writing--as in photo. The best novel I read this summer was Sandro Veronesi's The Hummingbird. Best non-fiction was Peter Godfrey-Smith's Metazoa: Animal Life and the Birth of the Mind. I gave time also to some wonderful novellas--perfect fictional form for you too-busy scientists. IAN MCEWAN is a novelist whose works have earned him worldwide critical acclaim. He is the recipient of the Man Booker Prize for Amsterdam (1998), the National Book Critics' Circle Fiction Award, and the Los Angeles Times Prize for Fiction for Atonement (2003). His most recent novel is Machines Like Me.
In 2019, Časlav Brukner and I were walking on a beach on Lamma Island, near Hong Kong, marvelling together at the astonishing strangeness of quantum phenomena. This summer, the conversation with Časlav has continued on another island, and quite an island: Lesbos, the northern Greek island near the Turkish coast. Lesbos is the place where lyrical poetry was born. Here lived Sappho and Alcaeus.


Hidden Pentagon records reveal patterns of failure in deadly U.S. airstrikes

The Japan Times

Shortly before 3 a.m. on July 19, 2016, U.S. Special Operations forces bombed what they believed were three Islamic State (IS) group "staging areas" on the outskirts of Tokhar, a riverside hamlet in northern Syria. They reported 85 fighters killed. In fact, they hit houses far from the front line, where farmers, their families and other local people sought nighttime sanctuary from bombing and gunfire. More than 120 villagers were killed. In early 2017 in Iraq, an American war plane struck a dark-colored vehicle, believed to be a car bomb, stopped at an intersection in the Wadi Hajar neighborhood of West Mosul. Actually, the car had been bearing not a bomb but a man named Majid Mahmoud Ahmed, his wife and their two children, who were fleeing the fighting nearby. They and three other civilians were killed. In November 2015, after observing a man dragging an "unknown heavy object" into an IS "defensive fighting position," U.S. forces struck a building in Ramadi, Iraq. A military review found that the object was actually "a person of small stature" -- a child -- who died in the strike. None of these deadly failures resulted in a finding of wrongdoing. These cases are drawn from a hidden Pentagon archive of the American air war in the Middle East since 2014. The trove of documents -- the military's own confidential assessments of more than 1,300 reports of civilian casualties, obtained by The New York Times -- lays bare how the air war has been marked by deeply flawed intelligence, rushed and often imprecise targeting and the deaths of thousands of civilians, many of them children, a sharp contrast to the U.S. government's image of war waged by all-seeing drones and precision bombs. The documents show, too, that despite the Pentagon's highly codified system for examining civilian casualties, pledges of transparency and accountability have given way to opacity and impunity. In only a handful of cases were the assessments made public. 
Not a single record provided includes a finding of wrongdoing or disciplinary action. Fewer than a dozen condolence payments were made, even though many survivors were left with disabilities requiring expensive medical care. Documented efforts to identify root causes or lessons learned are rare. The air campaign represents a fundamental transformation of warfare that took shape in the final years of the Obama administration, amid the deepening unpopularity of the forever wars that had claimed more than 6,000 American service members. The United States traded many of its boots on the ground for an arsenal of aircraft directed by controllers sitting at computers, often thousands of kilometers away. President Barack Obama called it "the most precise air campaign in history." This was the promise: America's "extraordinary technology" would allow the military to kill the right people while taking the greatest possible care not to harm the wrong ones. The IS caliphate ultimately crumbled under the weight of American bombing.